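State-of-the-art generative model-based attacks against image classifiers overwhelmingly focus on single-object (i.e., single dominant object) images. Different from such settings, we tackle the more practical problem of generating adversarial perturbations using multi-object (i.e., multiple dominant objects) images, as they are representative of most real-world scenes. Our goal is to design an attack strategy that can learn from such natural scenes by leveraging the local patch differences that occur inherently in such images (e.g., the difference between the local patch on the object 'person' and the local patch on the object 'bike' in a traffic scene). Our key idea is to misclassify an adversarial multi-object image by confusing the victim classifier on each local patch in the image. Based on this, we propose a novel generative attack (called Local Patch Difference or LPD-Attack) in which a novel contrastive loss function uses the aforesaid local differences in the feature space of multi-object scenes to optimize the perturbation generator. Through various experiments across diverse victim convolutional neural networks, we show that our approach outperforms baseline generative attacks, producing highly transferable perturbations when evaluated under different white-box and black-box settings.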
translated by Google Translate
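Most methods for crafting adversarial attacks have focused on scenes with a single dominant object (e.g., images from ImageNet). Natural scenes, on the other hand, include multiple dominant objects that are semantically related. It is therefore crucial to explore attack strategies that look beyond learning on single-object scenes or attacking single-object victim classifiers. Because generative models inherently yield perturbations that transfer strongly to unknown models, this paper presents the first approach that uses generative models for adversarial attacks on multi-object scenes. To represent the relationships between different objects in the input scene, we leverage the open-source pre-trained vision-language model CLIP (Contrastive Language-Image Pre-training), with the motivation of exploiting the semantics encoded in the language space along with the visual space. We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA). GAMA demonstrates the utility of the CLIP model as an attacker's tool for training strong perturbation generators for multi-object scenes. Using joint image-text features to train the generator, we show that GAMA can craft potent transferable perturbations that fool victim classifiers in various attack settings. For example, GAMA triggers about 16% more misclassification than state-of-the-art generative approaches in black-box settings where both the classifier architecture and the data distribution of the attacker differ from those of the victim. Our code will be made publicly available soon.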
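Blackbox transfer attacks on image classifiers have been studied extensively in recent years. In contrast, little progress has been made on transfer attacks against object detectors. Object detectors take a holistic view of the image, and the detection of one object (or the lack thereof) often depends on other objects in the scene. This makes such detectors inherently context-aware, and adversarial attacks against them are more challenging than those targeting image classifiers. In this paper, we present a new approach to generating context-aware attacks on object detectors. We show that by using the co-occurrence of objects and their relative locations and sizes as context information, we can successfully generate targeted misclassification attacks that achieve higher transfer success rates on blackbox object detectors than the state of the art. We test our approach on a variety of object detectors with images from the PASCAL VOC and MS COCO datasets and demonstrate an improvement of up to 20 percentage points over other state-of-the-art methods.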
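As a rough illustration of the kind of co-occurrence context mentioned above, the sketch below counts how often pairs of object labels appear in the same image. It is not the paper's method: the function name `cooccurrence_counts` and the toy annotations are assumptions made for this example, and relative locations and sizes are not modeled.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(image_labels):
    """Count, for each unordered label pair, the images that contain both labels."""
    counts = Counter()
    for labels in image_labels:
        # Deduplicate within an image and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(labels)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical per-image label lists (not taken from PASCAL VOC or MS COCO).
annotations = [
    ["person", "bicycle", "car"],
    ["person", "bicycle"],
    ["car", "traffic light"],
    ["person", "person", "dog"],
]
counts = cooccurrence_counts(annotations)
print(counts[("bicycle", "person")])
```

A table like this could indicate which target labels are plausible alongside the objects already in a scene; pairs that never co-occur simply return a count of zero.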
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems on the SLU intent detection task: 1) text-based, 2) lattice-based, and 3) a novel multimodal approach. Our work provides a comprehensive analysis of the performance achievable by different state-of-the-art SLU systems under different circumstances, e.g., automatically vs. manually generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) output allows SLU systems to improve over the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup, with a relative improvement of 18% over the 1-best configuration. Thus, crossmodal architectures represent a good alternative for overcoming the limitations of working with purely automatically generated textual data.
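Blackbox adversarial attacks can be categorized into transfer-based and query-based attacks. Transfer methods do not require any feedback from the victim model, but they provide lower success rates than query-based methods. Query attacks, in turn, often require a large number of queries to succeed. To get the best of both approaches, recent efforts have tried to combine them, but they still require hundreds of queries to achieve high success rates (especially for targeted attacks). In this paper, we propose a novel method for blackbox attacks via surrogate ensemble search (BASES) that can generate highly successful blackbox attacks using an extremely small number of queries. We first define a perturbation machine that generates a perturbed image by minimizing a weighted loss function over a fixed set of surrogate models. To generate an attack for a given victim model, we search over the weights in the loss function using queries generated by the perturbation machine. Since the dimension of the search space is small (the same as the number of surrogate models), the search requires only a small number of queries. We demonstrate that our proposed method requires at least 30x fewer queries than state-of-the-art methods on different image classifiers trained on ImageNet (including VGG-19, DenseNet-121, and ResNeXt-50). In particular, our method requires on average 3 queries per image to achieve a success rate above 90% for targeted attacks, and 1-2 queries per image to achieve a success rate above 99% for non-targeted attacks. Our method is also effective on the Google Cloud Vision API, where it achieves a 91% non-targeted attack success rate with 2.9 queries per image. We also show that the perturbations generated by our proposed method are highly transferable and can be used for hard-label blackbox attacks.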
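The two-level structure described above (an inner perturbation machine, an outer search over ensemble weights) can be sketched on a toy problem. This is a minimal sketch under strong assumptions, not the authors' implementation: the surrogates and the victim are random linear softmax classifiers, the perturbation is bounded in the L-infinity norm, the outer loop only tries increasing one weight at a time, and every name (`perturbation_machine`, `bases_attack`, `eps`, `step`) is hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def perturbation_machine(x, target, weights, surrogates, eps=0.5, steps=20, lr=0.1):
    """Minimize the weighted targeted cross-entropy over fixed linear surrogates."""
    delta = np.zeros_like(x)
    onehot = np.eye(surrogates[0].shape[0])[target]
    for _ in range(steps):
        grad = np.zeros_like(x)
        for w, W in zip(weights, surrogates):
            # Input gradient of cross-entropy for logits = W @ (x + delta).
            grad += w * (W.T @ (softmax(W @ (x + delta)) - onehot))
        # Signed gradient step, projected back into the eps-ball.
        delta = np.clip(delta - lr * np.sign(grad), -eps, eps)
    return x + delta

def bases_attack(x, target, victim, surrogates, max_queries=10, step=0.5):
    """Search over ensemble weights, spending one victim query per candidate."""
    n = len(surrogates)
    weights = np.full(n, 1.0 / n)
    x_adv = perturbation_machine(x, target, weights, surrogates)
    probs, queries, i = victim(x_adv), 1, 0
    while probs.argmax() != target and queries < max_queries:
        trial = weights.copy()
        trial[i % n] += step          # emphasize one surrogate at a time
        trial /= trial.sum()
        cand = perturbation_machine(x, target, trial, surrogates)
        cand_probs = victim(cand)
        queries += 1
        if cand_probs[target] > probs[target]:  # keep only improving weights
            weights, x_adv, probs = trial, cand, cand_probs
        i += 1
    return x_adv, weights, queries

rng = np.random.default_rng(0)
surrogates = [rng.normal(size=(3, 8)) for _ in range(4)]
victim_W = sum(surrogates) / 4 + 0.1 * rng.normal(size=(3, 8))
victim = lambda v: softmax(victim_W @ v)  # toy victim, queried as a black box
x = rng.normal(size=8)
target = int((victim(x).argmax() + 1) % 3)
x_adv, weights, queries = bases_attack(x, target, victim, surrogates)
print(queries, weights.round(3))
```

The search space here has only four dimensions, one per surrogate, which is the property the abstract credits for the low query count; whether the toy attack succeeds depends on the random models.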
In this paper we explore the task of modeling (semi) structured object sequences; in particular, we focus on the problem of developing a structure-aware input representation for such sequences. We assume that each structured object is represented by a set of key-value pairs that encode its attributes. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as over other methods that use a record-view representation of the sequence \cite{de2021transformers4rec} or a simple flattened representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
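The key-conditioned view described above can be illustrated with a small data transformation: a sequence of key-value records becomes one value sequence per key. This covers only the input reshaping, not the TVM or KA models; the function name `key_conditioned_sequences`, the `missing` placeholder, and the sample records are assumptions made for this sketch.

```python
def key_conditioned_sequences(records, missing=None):
    """Turn a sequence of key-value records into one value sequence per key."""
    # The universe of keys is the union of keys seen in any record.
    keys = sorted({k for r in records for k in r})
    # For each key, follow its value through the records in order.
    return {k: [r.get(k, missing) for r in records] for k in keys}

# Hypothetical sequence of structured objects (e.g., successive ticket states).
records = [
    {"status": "open", "priority": 2},
    {"status": "open", "priority": 3, "owner": "ana"},
    {"status": "closed", "priority": 3, "owner": "ana"},
]
timelines = key_conditioned_sequences(records)
for key, values in timelines.items():
    print(key, values)
```

Each resulting list has the same length as the record sequence, so a per-key sequence model sees one token per time step regardless of how many keys exist.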
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments that increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been used to reconstruct subsampled OCT data, and more recently deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods that better aid clinicians in their decision-making and improve patient outcomes, reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
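The effect of Gaussian windowing in the spectral domain can be seen in a small numerical sketch: narrowing the window applied to a spectral interferogram broadens the peak in the resulting A-scan. The signal model (a single ideal reflector), the window widths, and the helper names below are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def gaussian_window(n, sigma_frac):
    """Gaussian centered on the spectrum; sigma is given as a fraction of n."""
    k = np.arange(n)
    return np.exp(-0.5 * ((k - (n - 1) / 2) / (sigma_frac * n)) ** 2)

def a_scan(spectrum):
    """Magnitude of the FFT of a spectral interferogram (positive depths only)."""
    return np.abs(np.fft.fft(spectrum))[: spectrum.size // 2]

def fwhm_bins(profile):
    """Number of depth bins at or above half of the peak value."""
    return int(np.count_nonzero(profile >= profile.max() / 2))

n = 2048
k = np.arange(n)
# Hypothetical interferogram from a single reflector at depth bin 300.
spectrum = np.cos(2 * np.pi * 300 * k / n)

full = a_scan(spectrum * gaussian_window(n, 0.25))     # wide spectral window
reduced = a_scan(spectrum * gaussian_window(n, 0.05))  # narrow spectral window
print(fwhm_bins(full), fwhm_bins(reduced))
```

The reflector stays at the same depth in both A-scans; only the width of its peak changes, which is the loss of axial resolution that a reconstruction network would then try to undo.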
Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different "protected groups" in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of possible groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds and proportional representation. We then propose a method to explain the bias in the representations of groups using the notion of Shapley values. We conclude with an experimental study showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.
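To make the detection problem concrete, the sketch below enumerates every attribute-value pattern and reports those with fewer than a given number of members in the top-$k$. This is deliberately the naive exponential enumeration that the abstract says is too costly, not the efficient search the paper proposes; the function name, the `min_size` filter, and the toy ranking are assumptions.

```python
from itertools import combinations, product

def groups_below_bound(items, k, lower_bound, min_size=1):
    """Brute force: patterns with fewer than `lower_bound` members in the top-k.

    `items` is a ranked list of dicts (best first). Only patterns matching at
    least `min_size` items overall are reported.
    """
    attrs = sorted(items[0])
    values = {a: sorted({it[a] for it in items}) for a in attrs}
    top = items[:k]
    flagged = []
    for r in range(1, len(attrs) + 1):
        for subset in combinations(attrs, r):
            for combo in product(*(values[a] for a in subset)):
                pattern = dict(zip(subset, combo))
                total = sum(all(it[a] == v for a, v in pattern.items()) for it in items)
                in_top = sum(all(it[a] == v for a, v in pattern.items()) for it in top)
                if total >= min_size and in_top < lower_bound:
                    flagged.append((pattern, in_top, total))
    return flagged

# Hypothetical ranking, best item first.
ranked = [
    {"gender": "M", "region": "north"},
    {"gender": "M", "region": "south"},
    {"gender": "M", "region": "north"},
    {"gender": "F", "region": "north"},
    {"gender": "F", "region": "south"},
    {"gender": "F", "region": "south"},
]
flagged = groups_below_bound(ranked, k=3, lower_bound=1, min_size=2)
for pattern, in_top, total in flagged:
    print(pattern, in_top, total)
```

With a handful of attributes this is fine, but the number of patterns grows exponentially with the number of attributes, which is why a pruned search is needed in practice.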
Previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging because it contains vehicles in complex traffic scenarios with intra-class and inter-class variations in type, scale, pose, occlusion, and lighting conditions. Current object detectors such as YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on the FGVD dataset, we also present the results of combining an existing detector with the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.